Post-Quantization Gain Correction in Audio Coding

ABSTRACT

A gain adjustment apparatus for use in decoding of audio that has been encoded with separate gain and shape representations includes an accuracy meter configured to estimate an accuracy measure of the shape representation, and to determine a gain correction based on the estimated accuracy measure. An envelope adjuster further included in the apparatus is configured to adjust the gain representation based on the determined gain correction.

RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 14/002,509, filed 30 Aug. 2013, which is a U.S. National Phase Application of PCT/SE2011/050899 filed 4 Jul. 2011, which claims benefit of U.S. Provisional Application No. 61/449,230, filed 4 Mar. 2011. The entire contents of each of the aforementioned applications are incorporated herein by reference.

TECHNICAL FIELD

The present technology relates to gain correction in audio coding based on quantization schemes where the quantization is divided into a gain representation and a shape representation, so-called gain-shape audio coding, and especially to post-quantization gain correction.

BACKGROUND

Modern telecommunication services are expected to handle many different types of audio signals. While the main audio content is speech signals, there is a desire to handle more general signals such as music and mixtures of music and speech. Although the capacity in telecommunication networks is continuously increasing, it is still of great interest to limit the required bandwidth per communication channel. In mobile networks, smaller transmission bandwidths for each call yield lower power consumption in both the mobile device and the base station. This translates to energy and cost saving for the mobile operator, while the end user will experience prolonged battery life and increased talk-time. Further, with less consumed bandwidth per user, the mobile network can service a larger number of users in parallel.

Today, the dominating compression technology for mobile voice services is CELP (Code Excited Linear Prediction), which achieves good audio quality for speech at low bandwidths. It is widely used in deployed codecs such as AMR (Adaptive MultiRate), AMR-WB (Adaptive MultiRate WideBand) and GSM-EFR (Global System for Mobile communications, Enhanced FullRate). However, for general audio signals such as music the CELP technology has poor performance. These signals can often be better represented by using frequency transform based coding, for example, the ITU-T codecs G.722.1 [1] and G.719 [2]. However, transform domain codecs generally operate at a higher bitrate than the speech codecs. There is a gap between the speech and general audio domains in terms of coding, and it is desirable to increase the performance of transform domain codecs at lower bitrates.

Transform domain codecs require a compact representation of the frequency domain transform coefficients. These representations often rely on vector quantization (VQ), where the coefficients are encoded in groups. Among the various methods for vector quantization is the gain-shape VQ. This approach applies normalization to the vectors before encoding the individual coefficients. The normalization factor and the normalized coefficients are referred to as the gain and the shape of the vector, which may be encoded separately. The gain-shape structure has many benefits. By dividing the gain and the shape, the codec can easily be adapted to varying source input levels by designing the gain quantizer. It is also beneficial from a perceptual perspective where the gain and shape may carry different importance in different frequency regions. Finally, the gain-shape division simplifies the quantizer design and makes it less complex in terms of memory and computational resources compared to an unconstrained vector quantizer. A functional overview of a gain-shape quantizer can be seen in FIG. 1.

If applied to a frequency domain spectrum, the gain-shape structure can be used to form a spectral envelope and fine structure representation. The sequence of gain values forms the envelope of the spectrum while the shape vectors give the spectral detail. From a perceptual perspective, it is beneficial to partition the spectrum using a non-uniform band structure which follows the frequency resolution of the human auditory system. This generally means that narrow bandwidths are used for low frequencies while larger bandwidths are used for high frequencies. The perceptual importance of the spectral fine structure varies with the frequency but is also dependent on the characteristics of the signal itself. Transform coders often employ an auditory model to determine the important parts of the fine structure and assign the available resources to the most important parts. The spectral envelope is often used as input to this auditory model. The shape encoder quantizes the shape vectors using the assigned bits. See FIG. 2 for an example of a transform based coding system with an auditory model.

Depending on the accuracy of the shape quantizer, the gain value used to reconstruct the vector may be more or less appropriate. Especially when the allocated bits are few, the gain value drifts away from the optimal value. One way to solve this is to encode a correcting factor which accounts for the gain mismatch after the shape quantization. Another solution is to encode the shape first and then compute the optimal gain factor given the quantized shape.

Encoding a gain correction factor after shape quantization may consume considerable bitrate. If the rate is already low, this means more bits have to be taken elsewhere, which may reduce the available bitrate for the fine structure.

Encoding the shape before encoding the gain is a better solution, but if the bitrate for the shape quantizer is decided from the quantized gain value, then the gain and shape quantization would depend on each other. An iterative solution could likely solve this co-dependency, but it could easily become too complex to run in real-time on a mobile device.

SUMMARY

An object is to obtain a gain adjustment in decoding of audio that has been encoded with separate gain and shape representations.

This object is achieved in accordance with the attached claims.

A first aspect involves a gain adjustment method that includes the following steps:

-   An accuracy measure of the shape representation is estimated.
-   A gain correction is determined based on the estimated accuracy measure.
-   The gain representation is adjusted based on the determined gain correction.

A second aspect involves a gain adjustment apparatus that includes:

-   An accuracy meter configured to estimate an accuracy measure of the shape representation and to determine a gain correction based on the estimated accuracy measure.
-   An envelope adjuster configured to adjust the gain representation based on the determined gain correction.

A third aspect involves a decoder including a gain adjustment apparatus in accordance with the second aspect.

A fourth aspect involves a network node including a decoder in accordance with the third aspect.

The proposed scheme for gain correction improves the perceived quality of a gain-shape audio coding system. The scheme has low computational complexity and requires few, if any, additional bits.

BRIEF DESCRIPTION OF THE DRAWINGS

The present technology, together with further objects and advantages thereof, may best be understood by making reference to the following description taken together with the accompanying drawings, in which:

FIG. 1 illustrates an example gain-shape vector quantization scheme;

FIG. 2 illustrates an example transform domain coding and decoding scheme;

FIG. 3A-C illustrates gain-shape vector quantization in a simplified case;

FIG. 4 illustrates an example transform domain decoder using an accuracy measure to determine an envelope correction;

FIG. 5A-B illustrates an example result of scaling the synthesis with gain factors when the shape vector is a sparse pulse vector;

FIG. 6A-B illustrates how the largest pulse height can indicate the accuracy of the shape vector;

FIG. 7 illustrates an example of a rate based attenuation function for embodiment 1;

FIG. 8 illustrates an example of a rate and maximum pulse height dependent gain adjustment function for embodiment 1;

FIG. 9 illustrates another example of a rate and maximum pulse height dependent gain adjustment function for embodiment 1;

FIG. 10 illustrates an embodiment of the present technology in the context of an MDCT based audio coder and decoder system;

FIG. 11 illustrates an example of a mapping function from the stability measure to the gain adjustment limitation factor;

FIG. 12 illustrates an example of an ADPCM encoder and decoder system with an adaptive step size;

FIG. 13 illustrates an example in the context of a subband ADPCM based audio coder and decoder system;

FIG. 14 illustrates an embodiment of the present technology in the context of a subband ADPCM based audio coder and decoder system;

FIG. 15 illustrates an example transform domain encoder including a signal classifier;

FIG. 16 illustrates another example transform domain decoder using an accuracy measure to determine an envelope correction;

FIG. 17 illustrates an embodiment of a gain adjustment apparatus in accordance with the present technology;

FIG. 18 illustrates an embodiment of gain adjustment in accordance with the present technology in more detail;

FIG. 19 is a flow chart illustrating the method in accordance with the present technology;

FIG. 20 is a flow chart illustrating an embodiment of the method in accordance with the present technology; and

FIG. 21 illustrates an embodiment of a network in accordance with the present technology.

DETAILED DESCRIPTION

In the following description the same reference designations will be used for elements performing the same or similar function.

Before the present technology is described in detail, gain-shape coding will be illustrated with reference to FIG. 1-3.

FIG. 1 illustrates an example gain-shape vector quantization scheme. The upper part of the figure illustrates the encoder side. An input vector x is forwarded to a norm calculator 10, which determines the vector norm (gain) g, typically the Euclidean norm. This exact norm is quantized in a norm quantizer 12, and the inverse 1/ĝ of the quantized norm ĝ is forwarded to a multiplier 14 for scaling the input vector x into a shape. The shape is quantized in a shape quantizer 16. Representations of the quantized gain and shape are forwarded to a bitstream multiplexer (mux) 18. These representations are illustrated by dashed lines to indicate that they may, for example, constitute indices into tables (code books) rather than the actual quantized values.

The lower part of FIG. 1 illustrates the decoder side. A bitstream demultiplexer (demux) 20 receives the gain and shape representations. The shape representation is forwarded to a shape dequantizer 22, and the gain representation is forwarded to a gain dequantizer 24. The obtained gain ĝ is forwarded to a multiplier 26, where it scales the obtained shape, which gives the reconstructed vector x̂.
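The signal flow of FIG. 1 can be sketched in a few lines of Python. This is a minimal illustration, not the quantizers of any particular codec: the uniform log-domain gain quantizer, its `gain_step_db` parameter and the function names are assumptions made for the example, and the shape quantizer 16 is left out so that the effect of gain quantization alone is visible.

```python
import math

def gain_shape_encode(x, gain_step_db=3.0):
    """Encoder side of FIG. 1 with a hypothetical log-domain gain quantizer."""
    g = math.sqrt(sum(v * v for v in x))           # norm calculator 10 (Euclidean norm)
    if g == 0.0:
        raise ValueError("zero vector has no defined shape")
    # Norm quantizer 12 (assumed): uniform steps of gain_step_db in the dB domain.
    index = math.floor(20.0 * math.log10(g) / gain_step_db + 0.5)
    g_hat = 10.0 ** (index * gain_step_db / 20.0)
    # Multiplier 14: scale with 1/g_hat, so the shape norm is close to 1, not exactly 1.
    shape = [v / g_hat for v in x]
    return index, shape

def gain_shape_decode(index, shape_hat, gain_step_db=3.0):
    """Decoder side of FIG. 1: gain dequantizer 24 followed by multiplier 26."""
    g_hat = 10.0 ** (index * gain_step_db / 20.0)
    return [g_hat * v for v in shape_hat]
```

With an unquantized shape the decoder reproduces the input exactly, because the same quantized gain ĝ is used for scaling on both sides; any reconstruction error then comes from the shape quantizer.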

FIG. 2 illustrates an example transform domain coding and decoding scheme. The upper part of the figure illustrates the encoder side. An input signal is forwarded to a frequency transformer 30, for example, based on the Modified Discrete Cosine Transform (MDCT), to produce the frequency transform X. The frequency transform X is forwarded to an envelope calculator 32, which determines the energy E(b) of each frequency band b. These energies are quantized into energies Ê(b) in an envelope quantizer 34. The quantized energies Ê(b) are forwarded to an envelope normalizer 36, which scales the coefficients of frequency band b of the transform X with the inverse of the corresponding quantized energy Ê(b) of the envelope. The resulting scaled shapes are forwarded to a fine structure quantizer 38. The quantized energies Ê(b) are also forwarded to a bit allocator 40, which allocates bits for fine structure quantization to each frequency band b. As noted above, the bit allocation R(b) may be based on a model of the human auditory system. Representations of the quantized gains Ê(b) and corresponding quantized shapes are forwarded to bitstream multiplexer 18.

The lower part of FIG. 2 illustrates the decoder side. The bitstream demultiplexer 20 receives the gain and shape representations. The gain representations are forwarded to an envelope dequantizer 42. The generated envelope energies Ê(b) are forwarded to a bit allocator 44, which determines the bit allocation R(b) of the received shapes. The shape representations are forwarded to a fine structure dequantizer 46, which is controlled by the bit allocation R(b). The decoded shapes are forwarded to an envelope shaper 48, which scales them with the corresponding envelope energies Ê(b) to form a reconstructed frequency transform. This transform is forwarded to an inverse frequency transformer 50, for example, based on the Inverse Modified Discrete Cosine Transform (IMDCT), which produces an output signal representing synthesized audio.

FIG. 3A-C illustrates gain-shape vector quantization described above in a simplified case where the frequency band b is represented by the 2-dimensional vector X(b) in FIG. 3A. This case is simple enough to be illustrated in a drawing, but also general enough to illustrate the problem with gain-shape quantization (in practice the vectors typically have 8 or more dimensions). The right hand side of FIG. 3A illustrates an exact gain-shape representation of the vector X(b) with a gain E(b) and a shape (unit length vector) N′(b).

However, as illustrated in FIG. 3B, the exact gain E(b) is encoded into a quantized gain Ê(b) on the encoder side. Since the inverse of the quantized gain Ê(b) is used for scaling of the vector X(b), the resulting scaled vector N(b) will point in the correct direction, but will not necessarily be of unit length. During shape quantization, the scaled vector N(b) is quantized into the quantized shape N̂(b). In this case, the quantization is based on a pulse coding scheme [3], which constructs the shape (or direction) from a sum of signed integer pulses. The pulses may be added on top of each other for each dimension. This means that the allowed shape quantization positions are represented by the large dots in the rectangular grids illustrated in FIG. 3B-C. The result is that the quantized shape N̂(b) will in general not coincide with the shape (direction) of N(b) (and N′(b)).

FIG. 3C illustrates that the accuracy of the shape quantization depends on the allocated bits R(b), or equivalently the total number of pulses available for shape quantization. In the left part of FIG. 3C the shape quantization is based on 8 pulses, whereas the shape quantization in the right part uses only 3 pulses (the example in FIG. 3B uses 4 pulses).
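The dependence of the shape accuracy on the number of pulses can be illustrated with a small experiment. The sketch below is not the pulse coding scheme of [3]; it is a hypothetical greedy search, assumed here only for illustration, that places one signed unit pulse at a time where it maximizes the normalized correlation with the target. The function names and the 2-dimensional test vector are likewise assumptions.

```python
import math

def greedy_pulse_shape(target, num_pulses):
    """Place num_pulses signed unit pulses greedily (illustrative only)."""
    pulses = [0] * len(target)
    corr = 0.0      # running correlation between target and pulse vector
    energy = 0.0    # running energy of the pulse vector
    for _ in range(num_pulses):
        best_i, best_score = 0, -1.0
        for i, x in enumerate(target):
            new_corr = corr + abs(x)
            new_energy = energy + 2 * abs(pulses[i]) + 1
            score = new_corr * new_corr / new_energy
            if score > best_score:
                best_i, best_score = i, score
        sign = 1 if target[best_i] >= 0 else -1
        pulses[best_i] += sign
        corr += abs(target[best_i])
        energy += 2 * abs(pulses[best_i]) - 1
    return pulses

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

target = [0.8, 0.6]                        # hypothetical 2-dimensional shape
coarse = greedy_pulse_shape(target, 3)     # few pulses: coarse direction grid
fine = greedy_pulse_shape(target, 8)       # more pulses: finer direction grid
```

For this particular target the 8-pulse vector points closer to the target direction than the 3-pulse vector. The improvement is not guaranteed to be monotonic in the number of pulses for every target, since the grid of reachable directions changes with each added pulse.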

Thus, it is appreciated that depending on the accuracy of the shape quantizer, the gain value Ê(b) used to reconstruct the vector X(b) on the decoder side may be more or less appropriate. In accordance with the present technology, a gain correction can be based on an accuracy measure of the quantized shape.

The accuracy measure used to correct the gain may be derived from parameters already available in the decoder, but it may also depend on additional parameters designated for the accuracy measure. Typically, the parameters would include the number of allocated bits for the shape vector and the shape vector itself, but it may also include the gain value associated with the shape vector and pre-stored statistics about the signals that are typical for the encoding and decoding system. An overview of a system incorporating an accuracy measure and gain correction or adjustment is shown in FIG. 4.

FIG. 4 illustrates an example transform domain decoder 300 using an accuracy measure to determine an envelope correction. In order to avoid cluttering of the drawing, only the decoder side is illustrated. The encoder side may be implemented as in FIG. 2. The new feature is a gain adjustment apparatus 60. The gain adjustment apparatus 60 includes an accuracy meter 62 configured to estimate an accuracy measure A(b) of the shape representation N̂(b), and to determine a gain correction g_(c)(b) based on the estimated accuracy measure A(b). It also includes an envelope adjuster 64 configured to adjust the gain representation Ê(b) based on the determined gain correction.

As indicated above, the gain correction may in some embodiments be performed without spending additional bits. This is done by estimating the gain correction from parameters already available in the decoder. This process can be described as an estimation of the accuracy of the encoded shape. Typically, this estimation includes deriving the accuracy measure A(b) from shape quantization characteristics indicating the resolution of the shape quantization.

Embodiment 1

In one embodiment, the present technology is used in an audio encoder/decoder system. The system is transform based and the transform used is the Modified Discrete Cosine Transform (MDCT) using sinusoidal windows with 50% overlap. However, it is understood that any transform suitable for transform coding may be used together with appropriate segmentation and windowing.

Encoder of Embodiment 1

The input audio is extracted into frames using 50% overlap and windowed with a symmetric sinusoidal window. Each windowed frame is then transformed to an MDCT spectrum X. The spectrum is partitioned into subbands for processing, where the subband widths are non-uniform. The spectral coefficients of frame m belonging to band b are denoted X(b,m) and have the bandwidth BW(b). Since most encoder and decoder steps can be described within one frame, we omit the frame index and just use the notation X(b). The bandwidths should preferably increase with increasing frequency to comply with the frequency resolution of the human auditory system. The root-mean-square (RMS) value of each band is used as a normalization factor and is denoted E(b):

$\begin{matrix}{{E(b)} = \sqrt{\frac{{X(b)}^{T}{X(b)}}{{BW}(b)}}} & (1)\end{matrix}$

where X(b)^(T) denotes the transpose of X(b).

The RMS value can be seen as the energy value per coefficient. The sequence of normalization factors E(b) for b=1, 2, . . . , N_(bands) forms the envelope of the MDCT spectrum, where N_(bands) denotes the number of bands. Next, the sequence is quantized in order to be transmitted to the decoder. To ensure that the normalization can be reversed in the decoder, the quantized envelope Ê(b) is obtained. In this example embodiment the envelope coefficients are scalar quantized in log domain using a step size of 3 dB and the quantizer indices are differentially encoded using Huffman coding. The quantized envelope is used for normalization of the spectral bands, i.e.:

$\begin{matrix}{{N(b)} = {\frac{1}{\hat{E}(b)}{X(b)}}} & (2)\end{matrix}$

Note that if the non-quantized envelope E(b) is used for normalization, the shape would have RMS=1, i.e.:

$\begin{matrix}{{N^{\prime}(b)} = {\left. {\frac{1}{E(b)}{X(b)}}\Rightarrow\sqrt{\frac{{N^{\prime}(b)}^{T}{N^{\prime}(b)}}{{BW}(b)}} \right. = 1}} & (3)\end{matrix}$

By using the quantized envelope Ê(b), the shape vector will have an RMS value close to 1. This feature will be used in the decoder to create an approximation of the gain value.
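Equations (1) to (3) can be sketched as follows. The 3 dB step comes from the text; interpreting it as 20·log10 of the RMS value, and the function names, are assumptions of this sketch. The differential Huffman coding of the indices is omitted.

```python
import math

def band_rms(band):
    """Equation (1): RMS value E(b) of the coefficients in one band."""
    return math.sqrt(sum(v * v for v in band) / len(band))

def quantize_envelope(e, step_db=3.0):
    """Scalar quantization of E(b) in the log domain (assumed dB convention)."""
    index = math.floor(20.0 * math.log10(e) / step_db + 0.5)
    e_hat = 10.0 ** (index * step_db / 20.0)
    return index, e_hat

def normalize_band(band, e_hat):
    """Equation (2): N(b) = X(b) / E_hat(b)."""
    return [v / e_hat for v in band]

band = [2.0, -2.0, 2.0, -2.0]            # hypothetical MDCT coefficients of one band
index, e_hat = quantize_envelope(band_rms(band))
shape = normalize_band(band, e_hat)
```

Because Ê(b) lies within half a quantizer step (1.5 dB) of E(b), the RMS value of the normalized shape stays close to 1 without being exactly 1, which is the property the decoder relies on.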

The union of the normalized shape vectors N(b) forms the fine structure of the MDCT spectrum. The quantized envelope is used to produce a bit allocation R(b) for encoding of the normalized shape vectors N(b). The bit allocation algorithm preferably uses an auditory model to distribute the bits to the perceptually most relevant parts. Any quantizer scheme may be used for encoding the shape vector. Common for all is that they may be designed under the assumption that the input is normalized, which simplifies quantizer design. In this embodiment the shape quantization is done using a pulse coding scheme which constructs the synthesis shape from a sum of signed integer pulses [3]. The pulses may be added on top of each other to form pulses of different height. In this embodiment the bit allocation R(b) denotes the number of pulses assigned to band b.

The quantizer indices from the envelope quantization and shape quantization are multiplexed into a bitstream to be stored or transmitted to a decoder.

Decoder of Embodiment 1

The decoder demultiplexes the indices from the bitstream and forwards the relevant indices to each decoding module. First, the quantized envelope Ê(b) is obtained. Next, the fine structure bit allocation is derived from the quantized envelope using a bit allocation identical to the one used in the encoder. The shape vectors N̂(b) of the fine structure are decoded using the indices and the obtained bit allocation R(b).

Now, before scaling the decoded fine structure with the envelope, additional gain correction factors are determined. First, the RMS matching gain is obtained as:

$\begin{matrix}{{g_{RMS}(b)} = \sqrt{\frac{{BW}(b)}{{\hat{N}(b)}^{T}{\hat{N}(b)}}}} & (4)\end{matrix}$

The g_(RMS)(b) factor is a scaling factor that normalizes the RMS value to 1, i.e.:

$\begin{matrix}{\sqrt{\frac{\left( {{g_{RMS}(b)}{\hat{N}(b)}} \right)^{T}\left( {{g_{RMS}(b)}{\hat{N}(b)}} \right)}{{BW}(b)}} = 1} & (5)\end{matrix}$
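As a small numerical check of equations (4) and (5), the sketch below computes the RMS matching gain for a decoded pulse vector and verifies that the scaled vector has unit RMS. The function names and the example vector are assumptions for illustration.

```python
import math

def rms_matching_gain(shape_hat):
    """Equation (4): g_RMS(b) = sqrt(BW(b) / (N_hat^T N_hat))."""
    energy = sum(v * v for v in shape_hat)
    if energy == 0.0:
        raise ValueError("an all-zero shape has no RMS matching gain")
    return math.sqrt(len(shape_hat) / energy)

def rms(vector):
    return math.sqrt(sum(v * v for v in vector) / len(vector))

shape_hat = [0, 2, 0, 0, -1, 0, 0, 0]     # hypothetical decoded pulses, BW(b) = 8
g_rms = rms_matching_gain(shape_hat)
scaled = [g_rms * v for v in shape_hat]   # has RMS value 1, as stated by equation (5)
```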

In this embodiment we seek to minimize the mean squared error (MSE) of the synthesis:

$\begin{matrix}{{g_{MSE}(b)} = {\underset{g}{\arg\min}\;\left\| {{N(b)} - {g \cdot {\hat{N}(b)}}} \right\|^{2}}} & (6)\end{matrix}$

with the solution

$\begin{matrix}{{g_{MSE}(b)} = \frac{{\hat{N}(b)}^{T}{N(b)}}{{\hat{N}(b)}^{T}{\hat{N}(b)}}} & (7)\end{matrix}$

Since g_(MSE)(b) depends on the input shape N(b), it is not known in the decoder. In this embodiment the impact is estimated by using an accuracy measure. The ratio of g_(MSE)(b) to g_(RMS)(b) is defined as a gain correction factor g_(c)(b):

$\begin{matrix}{{g_{c}(b)} = \frac{g_{MSE}(b)}{g_{RMS}(b)}} & (8)\end{matrix}$

When the accuracy of the shape quantization is good, the correction factor is close to 1, i.e.:

N̂(b)→N(b)

g_(c)(b)→1  (9)

However, when the accuracy of N̂(b) is low, g_(MSE)(b) and g_(RMS)(b) will diverge. In this embodiment, where the shape is encoded using a pulse coding scheme, a low rate will make the shape vector sparse and g_(RMS)(b) will give an overestimate of the appropriate gain in terms of MSE. For this case g_(c)(b) should be lower than 1 to compensate for the overshoot. See FIG. 5A-B for an example illustration of the low rate pulse shape case. FIG. 5A-B illustrates an example of scaling the synthesis with g_(MSE) (FIG. 5B) and g_(RMS) (FIG. 5A) gain factors when the shape vector is a sparse pulse vector. The g_(RMS) scaling gives pulses that are too high in an MSE sense.
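The overshoot can be reproduced numerically. The sketch below evaluates the ideal correction factor of equation (8), using the least-squares minimizer N̂ᵀN / N̂ᵀN̂ for g_MSE, on the encoder side where the input shape N(b) is available. The function names and the two test vectors (a flat target represented by a single pulse, and a perfectly quantized target) are assumptions chosen to show the two extremes.

```python
import math

def mse_gain(shape, shape_hat):
    """Least-squares gain minimizing ||N - g * N_hat||^2."""
    num = sum(a * b for a, b in zip(shape_hat, shape))
    den = sum(b * b for b in shape_hat)
    return num / den

def rms_matching_gain(shape_hat):
    """Equation (4): gain that normalizes the RMS value of N_hat to 1."""
    return math.sqrt(len(shape_hat) / sum(v * v for v in shape_hat))

def ideal_gain_correction(shape, shape_hat):
    """Equation (8): g_c = g_MSE / g_RMS (needs the input shape, so encoder side only)."""
    return mse_gain(shape, shape_hat) / rms_matching_gain(shape_hat)

flat_target = [1.0] * 8                   # flat shape with RMS value 1
one_pulse = [1, 0, 0, 0, 0, 0, 0, 0]      # very sparse low-rate synthesis
sparse_case = ideal_gain_correction(flat_target, one_pulse)
exact_case = ideal_gain_correction(flat_target, flat_target)
```

In the sparse case g_RMS concentrates the whole band energy in one coefficient, so the ideal correction is well below 1; in the exact case it equals 1.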

On the other hand, a peaky or sparse target signal can be well represented with a pulse shape. While the sparseness of the input signal may not be known in the synthesis stage, the sparseness of the synthesis shape may serve as an indicator of the accuracy of the synthesized shape vector. One way to measure the sparseness of the synthesis shape is the height of the maximum peak in the shape. The reasoning behind this is that a sparse input signal is more likely to generate high peaks in the synthesis shape. See FIG. 6A-B for an illustration of how the peak height can indicate the accuracy of two equal rate pulse vectors.

In FIG. 6A there are 5 pulses available (R(b)=5) to represent the dashed shape. Since the shape is rather constant, the coding generated 5 distributed pulses of equal height 1, i.e. p_(max)=1. In FIG. 6B there are also 5 pulses available to represent the dashed shape. However, in this case the shape is peaky or sparse, and the largest peak is represented by 3 pulses on top of each other, i.e. p_(max)=3. This indicates that the gain correction g_(c)(b) depends on an estimated sparseness p_(max) of the quantized shape.

As noted above, the input shape N(b) is not known by the decoder. Since g_(MSE)(b) depends on the input shape N(b), this means that the gain correction or compensation g_(c)(b) can in practice not be based on the ideal equation (8). In this embodiment the gain correction g_(c)(b) is instead decided based on the bit-rate in terms of the number of pulses R(b), the height of the largest pulse in the shape vector p_(max)(b) and the frequency band b, i.e.:

g_(c)(b)=f(R(b), p_(max)(b), b)  (10)

It has been observed that the lower rates generally require an attenuation of the gain to minimize the MSE. The rate dependency may be implemented as a lookup table t(R(b)) which is trained on relevant audio signal data. An example lookup table can be seen in FIG. 7. Since the shape vectors in this embodiment have different widths, the rate may preferably be expressed as number of pulses per sample. In this way the same rate dependent attenuation can be used for all bandwidths. An alternative solution, which is used in this embodiment, is to use a step size T in the table depending on the width of the band. Here, we use 4 different bandwidths in 4 different groups and hence require 4 step sizes. An example of step sizes is found in Table 1. Using the step size, the lookup value is obtained by using a rounding operation t(└R(b)·T┘), where └ ┘ represents rounding to the closest integer.

TABLE 1

  Band group   Bandwidth   Step size T
  ------------ ----------- -------------
  1            8           4
  2            16          4/3
  3            24          2
  4            34          1

Another example lookup table is given in Table 2.

TABLE 2

  Band group   Bandwidth   Step size T
  ------------ ----------- -------------
  1            8           4
  2            16          4/3
  3            24          2
  4            32          1

The estimated sparseness can be implemented as another lookup table u(R(b), p_(max)(b)) based on both the number of pulses R(b) and the height of the maximum pulse p_(max)(b). An example lookup table is shown in FIG. 8. The lookup table u serves as an accuracy measure A(b) for band b, i.e.:

A(b)=u(R(b), p_(max)(b))  (11)

It was noted that the approximation of g_(MSE) was more suitable for the lower frequency range from a perceptual perspective. For the higher frequencies the fine structure becomes less perceptually important and the matching of the energy or RMS value becomes vital. For this reason, the gain attenuation may be applied only below a certain band number b_(THR). In this case the gain correction g_(c)(b) will have an explicit dependence on the frequency band b. The resulting gain correction function can in this case be defined as:

$\begin{matrix}{{g_{c}(b)} = \left\{ \begin{matrix}{{{t\left( {R(b)} \right)} \cdot {A(b)}},} & {b < b_{THR}} \\{1,} & {otherwise}\end{matrix} \right.} & (12)\end{matrix}$

The description up to this point may also be used to describe the essential features of the example embodiment of FIG. 4. Thus, in the embodiment of FIG. 4, the final synthesis X̂(b) is calculated as:

$\begin{matrix}{{\hat{X}(b)} = {\underset{\overset{\sim}{E}(b)}{\underbrace{{g_{c}(b)}{g_{RMS}(b)}{\hat{E}(b)}}}{\hat{N}(b)}}} & (13)\end{matrix}$

As an alternative the function u(R(b), p_(max)(b)) may be implemented as a linear function of the maximum pulse height p_(max) and the allocated bit rate R(b), for example as:

u(R(b), p_(max)(b))=k·(p_(max)(b)−R(b))+1  (14)

where the inclination k is determined by:

$\begin{matrix}\begin{matrix}{k = \frac{1 - \left( {a_{\min} + {{R(b)} \cdot \Delta a}} \right)}{{R(b)} - 1}} \\{{\Delta a} = {\left( {a_{\max} - a_{\min}} \right)/{R(b)}}} \\{a_{\max} = {1 - \frac{1 - a_{\min}}{{R(b)} - 1}}}\end{matrix} & (15)\end{matrix}$

The function depends on the tuning parameter a_(min) which gives the initial attenuation factor for R(b)=1 and p_(max)(b)=1. The function is illustrated in FIG. 9, with the tuning parameter a_(min)=0.41. Typically u_(max)∈[0.7, 1.4] and u_(min)∈[0, u_(max)]. In equation (14) u is linear in the difference between p_(max)(b) and R(b). Another possibility is to have different inclination factors for p_(max)(b) and R(b).
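A direct transcription of equations (14) and (15) is sketched below. The expressions in equation (15) divide by R(b)−1 and are therefore undefined for R(b)=1; raising an error for that case is an assumption of this sketch, as is the function name.

```python
def linear_accuracy(num_pulses, p_max, a_min=0.41):
    """Equations (14)-(15): u = k * (p_max - R) + 1 with inclination k."""
    r = num_pulses
    if r < 2:
        # Equation (15) divides by R(b) - 1; handling of R(b) = 1 is not
        # specified by the formula, so this sketch rejects it.
        raise ValueError("equation (15) is undefined for fewer than 2 pulses")
    a_max = 1.0 - (1.0 - a_min) / (r - 1)
    delta_a = (a_max - a_min) / r
    k = (1.0 - (a_min + r * delta_a)) / (r - 1)
    return k * (p_max - r) + 1.0

stacked = linear_accuracy(5, 5)   # all 5 pulses on one coefficient
spread = linear_accuracy(5, 1)    # 5 pulses of height 1
```

With all pulses stacked on one coefficient (p_max(b)=R(b)) the function returns 1, meaning no attenuation, and the attenuation grows linearly as the pulses spread out.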

The bitrate for a given band may change drastically between adjacent frames. This may lead to fast variations of the gain correction. Such variations are especially critical when the envelope is fairly stable, i.e. the total changes between frames are quite small. This often happens for music signals which typically have more stable energy envelopes. To avoid that the gain attenuation introduces instability, an additional adaptation may be added. An overview of such an embodiment is given in FIG. 10, in which a stability meter 66 has been added to the gain adjustment apparatus 60 in the decoder 300.

The adaptation can, for example, be based on a stability measure of the envelope Ê(b). An example of such a measure is to compute the squared Euclidean distance between adjacent log₂ envelope vectors:

$\begin{matrix}{{\Delta \; {E(m)}} = {\frac{1}{N_{bands}}{\sum\limits_{b = 0}^{N_{bands} - 1}\; \left( {{\log_{2}{\hat{E}\left( {b,m} \right)}} - {\log_{2}{\hat{E}\left( {b,{m - 1}} \right)}}} \right)^{2}}}} & (16)\end{matrix}$

Here, ΔE(m) denotes the squared Euclidean distance between the envelope vectors for frame m and frame m−1. The stability measure may also be lowpass filtered to have a smoother adaptation:

ΔẼ(m)=αΔE(m)+(1−α)ΔẼ(m−1)  (17)

A suitable value for the forgetting factor α may be 0.1. The smoothed stability measure may then be used to create a limitation of the attenuation using, for example, a sigmoid function such as:

$\begin{matrix}{{g_{\min} = \frac{1}{1 + e^{{C_{1}{({{\Delta \; {\overset{\sim}{E}{(m)}}} - C_{2}})}} - C_{3}}}},} & (18)\end{matrix}$

where the parameters may be set to C₁=6, C₂=2 and C₃=1.9. It should be noted that these parameters are to be seen as examples, while the actual values may be chosen with more freedom. For instance:

C₁∈[1,10]

C₂∈[1,4]

C₃∈[−5,10]

FIG. 11 illustrates an example of a mapping function from the stability measure ΔẼ(m) to the gain adjustment limitation factor g_(min). The above expression for g_(min) is preferably implemented as a lookup table or with a simple step function, such as:

$\begin{matrix}{g_{\min} = \left\{ \begin{matrix}{1,} & {{\Delta \; {\overset{\sim}{E}(m)}} < {{C_{3}/C_{1}} + C_{2}}} \\{0,} & {{\Delta \; {\overset{\sim}{E}(m)}} \geq {{C_{3}/C_{1}} + C_{2}}}\end{matrix} \right.} & (19)\end{matrix}$

The attenuation limitation variable g_(min)∈[0,1] may be used to create a stability adapted gain modification g̃_(c)(b) as:

g̃_(c)(b)=max(g_(c)(b), g_(min))  (20)
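The stability adaptation of equations (16) to (20) can be sketched as follows. The constants α=0.1, C₁=6, C₂=2 and C₃=1.9 are the example values from the text; the function names, the recursive form of the smoother (previous smoothed value fed back) and the example envelopes are assumptions of this sketch.

```python
import math

def envelope_distance(envelope, prev_envelope):
    """Equation (16): mean squared difference of the log2 envelopes."""
    n = len(envelope)
    return sum((math.log2(a) - math.log2(b)) ** 2
               for a, b in zip(envelope, prev_envelope)) / n

def smooth_stability(delta, prev_smoothed, alpha=0.1):
    """Equation (17) as a first-order recursive lowpass filter."""
    return alpha * delta + (1.0 - alpha) * prev_smoothed

def attenuation_limit(smoothed, c1=6.0, c2=2.0, c3=1.9):
    """Equation (18): sigmoid mapping from the stability measure to g_min."""
    return 1.0 / (1.0 + math.exp(c1 * (smoothed - c2) - c3))

def attenuation_limit_step(smoothed, c1=6.0, c2=2.0, c3=1.9):
    """Equation (19): step function approximation of the sigmoid."""
    return 1.0 if smoothed < c3 / c1 + c2 else 0.0

def limit_gain_correction(g_c, g_min):
    """Equation (20): the correction never attenuates below g_min."""
    return max(g_c, g_min)

# A slowly varying envelope (small distance) gives g_min close to 1.
delta = envelope_distance([2.0, 4.0, 8.0], [1.0, 2.0, 4.0])
smoothed = smooth_stability(delta, prev_smoothed=0.0)
stable_gain = limit_gain_correction(0.6, attenuation_limit(smoothed))
# A strongly varying envelope gives g_min close to 0, so g_c passes through.
unstable_gain = limit_gain_correction(0.6, attenuation_limit(5.0))
```

For a stable envelope the limit g_min approaches 1 and effectively disables the attenuation, which avoids frame-to-frame gain fluctuations; for an unstable envelope g_min approaches 0 and the correction g_c(b) is applied unchanged.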

After the estimation of the gain, the final synthesis X̂(b) is calculated as:

$\begin{matrix}{{\hat{X}(b)} = {\underset{\overset{\sim}{E}(b)}{\underbrace{{{\overset{\sim}{g}}_{c}(b)}{g_{RMS}(b)}{\hat{E}(b)}}}{\hat{N}(b)}}} & (21)\end{matrix}$

In the described variations of embodiment 1 the union of the synthesized vectors X̂(b) forms the synthesized spectrum X̂, which is further processed using the inverse MDCT transform, windowed with the symmetric sine window and added to the output synthesis using the overlap-and-add strategy.

Embodiment 2

In another example embodiment, the shape is quantized using a QMF(Quadrature Mirror Filter) filter bank and an ADPCM (AdaptiveDifferential Pulse-Code Modulation) scheme for shape quantization. Anexample of a subband ADPCM scheme is the ITU-T G.722 [4]. The inputaudio signal is preferably processed in segments. An example ADPCMscheme is shown in FIG. 12, with an adaptive step size S. Here, theadaptive step size of the shape quantizer serves as an accuracy measurethat is already present in the decoder and does not require additionalsignaling. However, the quantization step size needs to be extractedfrom the parameters used by the decoding process and not from thesynthesized shape itself. An overview of this embodiment is shown inFIG. 14. However, before this embodiment is described in detail, anexample ADPCM scheme based on a QMF filter bank will be described withreference to FIGS. 12 and 13.

FIG. 12 illustrates an example of an ADPCM encoder and decoder system with an adaptive quantization step size. An ADPCM quantizer 70 includes an adder 72, which receives an input signal and subtracts an estimate of the previous input signal to form an error signal e. The error signal is quantized in a quantizer 74, the output of which is forwarded to the bitstream multiplexer 18, and also to a step size calculator 76 and a dequantizer 78. The step size calculator 76 adapts the quantization step size S to obtain an acceptable error. The quantization step size S is forwarded to the bitstream multiplexer 18, and also controls the quantizer 74 and the dequantizer 78. The dequantizer 78 outputs an error estimate ê to an adder 80. The other input of the adder 80 receives an estimate of the input signal which has been delayed by a delay element 82. This forms a current estimate of the input signal, which is forwarded to the delay element 82. The delayed signal is also forwarded to the step size calculator 76 and to (with a sign change) the adder 72 to form the error signal e.

An ADPCM dequantizer 90 includes a step size decoder 92, which decodes the received quantization step size S and forwards it to a dequantizer 94. The dequantizer 94 decodes the error estimate ê, which is forwarded to an adder 98, the other input of which receives the output signal from the adder delayed by a delay element 96.
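The signal flow of FIG. 12 may, as a simplified and non-limiting sketch, be written as below. The sketch assumes a first-order predictor (the delayed estimate itself), a uniform quantizer with a hypothetical index range of ±levels, and a step size that is held fixed over the processed frame rather than adapted sample by sample; none of these choices is mandated by the description.

```python
def adpcm_encode(samples, step, levels=4):
    """Differential quantization with a fixed step size for the frame.

    `estimate` models the output of the delay element 82; the encoder
    tracks the same reconstruction that the decoder will produce.
    """
    indices = []
    estimate = 0.0
    for x in samples:
        error = x - estimate                               # adder 72
        index = max(-levels, min(levels, round(error / step)))  # quantizer 74
        indices.append(index)
        estimate += index * step                           # dequantizer 78, adder 80
    return indices


def adpcm_decode(indices, step):
    """Rebuild the signal from the quantizer indices and the step size."""
    output = []
    estimate = 0.0                                         # delay element 96
    for index in indices:
        estimate += index * step                           # dequantizer 94, adder 98
        output.append(estimate)
    return output
```

With a step size of 0.5, the input [0.0, 0.5, 1.0, 0.5] is encoded as the indices [0, 1, 1, -1] and decoded back to the same four values.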

FIG. 13 illustrates an example in the context of a subband ADPCM based audio encoder and decoder system. The encoder side is similar to the encoder side of the embodiment of FIG. 2. The essential differences are that the frequency transformer 30 has been replaced by a QMF (Quadrature Mirror Filter) analysis filter bank 100, and that fine structure quantizer 38 has been replaced by an ADPCM quantizer, such as the quantizer 70 in FIG. 12. The decoder side is similar to the decoder side of the embodiment of FIG. 2. The essential differences are that the inverse frequency transformer 50 has been replaced by a QMF synthesis filter bank 102, and that fine structure dequantizer 46 has been replaced by an ADPCM dequantizer, such as the dequantizer 90 in FIG. 12.

FIG. 14 illustrates an embodiment of the present technology in the context of a subband ADPCM based audio coder and decoder system. In order to avoid cluttering of the drawing, only the decoder side 300 is illustrated. The encoder side may be implemented as in FIG. 13.

Encoder of Embodiment 2

The encoder applies the QMF filter bank to obtain the subband signals. The RMS values of each subband signal are calculated and the subband signals are normalized. The envelope E(b), subband bit allocation R(b) and normalized shape vectors N(b) are obtained as in embodiment 1. Each normalized subband is fed to the ADPCM quantizer. In this embodiment the ADPCM operates in a forward adaptive fashion, and determines a scaling step S(b) to be used for subband b. The scaling step is chosen to minimize the MSE across the subband frame. In this embodiment the step is chosen by trying all possible steps and selecting the one which gives the minimum MSE:

$\begin{matrix}{{S(b)} = {\min\limits_{s}{\frac{1}{{BW}(b)}\left( {{N(b)} - {Q\left( {{N(b)},s} \right)}} \right)^{T}\left( {{N(b)} - {Q\left( {{N(b)},s} \right)}} \right)}}} & (22)\end{matrix}$

where Q(x,s) is the ADPCM quantizing function of the variable x using a step size of s. The selected step size may be used to generate the quantized shape:

{circumflex over (N)}(b)=Q(N(b),S(b))  (23)
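The exhaustive search over step sizes and the subsequent quantization may be sketched as follows. The quantizing function below is only a stand-in for Q(x,s), assuming a first-order differential quantizer with a hypothetical index range of ±levels; the candidate step sizes are likewise illustrative.

```python
def quantize(shape, step, levels=4):
    """Stand-in for Q(x, s): differential quantization and reconstruction."""
    estimate = 0.0
    reconstructed = []
    for x in shape:
        index = max(-levels, min(levels, round((x - estimate) / step)))
        estimate += index * step
        reconstructed.append(estimate)
    return reconstructed


def mse(a, b):
    """Mean squared error between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def select_step(shape, candidate_steps):
    """Try every candidate step and keep the one with the lowest MSE."""
    return min(candidate_steps, key=lambda s: mse(shape, quantize(shape, s)))


def quantized_shape(shape, candidate_steps):
    """Quantize the shape with the selected step; return both results."""
    step = select_step(shape, candidate_steps)
    return step, quantize(shape, step)
```

The search cost grows linearly with the number of candidate steps, which is acceptable when the set of allowed step sizes is small.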

The quantizer indices from the envelope quantization and shape quantization are multiplexed into a bitstream to be stored or transmitted to a decoder.

Decoder of Embodiment 2

The decoder demultiplexes the indices from the bitstream and forwards the relevant indices to each decoding module. The quantized envelope Ê(b) and the bit allocation R(b) are obtained as in embodiment 1. The synthesized shape vectors {circumflex over (N)}(b) are obtained from the ADPCM decoder or dequantizer together with the adaptive step sizes S(b). The step sizes indicate an accuracy of the quantized shape vector, where a smaller step size corresponds to a higher accuracy and vice versa. One possible implementation is to make the accuracy A(b) inversely proportional to the step size using a proportionality factor γ:

$\begin{matrix}{{A(b)} = {\gamma \frac{1}{S(b)}}} & (24)\end{matrix}$

where γ should be set to achieve the desired relation. One possible choice is γ=S_(min) where S_(min) is the minimum step size, which gives accuracy 1 for S(b)=S_(min).
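As a brief illustration of equation (24), the accuracy may be computed as below. The function name is an assumption made for this example, and the step sizes in the usage note are illustrative values only.

```python
def shape_accuracy(step, gamma):
    """Equation (24): accuracy inversely proportional to the step size."""
    return gamma / step
```

With γ set to a minimum step size of 0.25, a band decoded with step size 0.25 has accuracy 1, while a band decoded with step size 0.5 has accuracy 0.5.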

The gain correction factor g_(c) may be obtained using a mapping function:

g _(c)(b)=h(R(b),b)·A(b)  (25)

The mapping function h may be implemented as a lookup table based on the rate R(b) and frequency band b. This table may be defined by clustering the optimal gain correction values g_(MSE)/g_(RMS) by these parameters and computing the table entry by averaging the optimal gain correction values for each cluster.
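One possible way to derive such a table offline and to apply equation (25) is sketched below. The training data layout, a sequence of (rate, band, g_MSE, g_RMS) tuples, and the function names are assumptions made for this example; each distinct (rate, band) pair is treated as one cluster.

```python
from collections import defaultdict


def build_gain_table(training):
    """Average the optimal correction g_MSE / g_RMS per (rate, band) cluster."""
    sums = defaultdict(lambda: [0.0, 0])
    for rate, band, g_mse, g_rms in training:
        entry = sums[(rate, band)]
        entry[0] += g_mse / g_rms   # accumulate optimal corrections
        entry[1] += 1               # count observations in the cluster
    return {key: total / count for key, (total, count) in sums.items()}


def gain_correction(table, rate, band, accuracy):
    """Equation (25): table entry h(R(b), b) scaled by the accuracy A(b)."""
    return table[(rate, band)] * accuracy
```

In practice the rate could be grouped into ranges before clustering, so that sparsely observed rates still yield a reliable table entry.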

After the estimation of the gain correction, the subband synthesis {circumflex over (X)}(b) is calculated as:

$\begin{matrix}{{\hat{X}(b)} = {\underbrace{{g_{c}(b)}\,{g_{RMS}(b)}\,{\hat{E}(n)}}_{\overset{\sim}{E}(n)}\,{\hat{N}(b)}}} & (26)\end{matrix}$

The output audio frame is obtained by applying the synthesis QMF filter bank to the subbands.

In the example embodiment illustrated in FIG. 14 the accuracy meter 62 in the gain adjustment apparatus 60 receives the not yet decoded quantization step size S(b) directly from the received bitstream. An alternative, as noted above, is to decode it in the ADPCM dequantizer 90 and forward it in decoded form to the accuracy meter 62.

Further Alternatives

The accuracy measure could be complemented with a signal class parameter derived in the encoder. This may for instance be a speech/music discriminator or a background noise level estimator. An overview of a system incorporating a signal classifier is shown in FIGS. 15 and 16. The encoder side in FIG. 15 is similar to the encoder side in FIG. 2, but has been provided with a signal classifier 104. The decoder side 300 in FIG. 16 is similar to the decoder side in FIG. 4, but has been provided with a further signal class input to the accuracy meter 62.

The signal class could be incorporated in the gain correction for instance by having a class dependent adaptation. If we assume the signal classes are speech or music corresponding to the values C=1 and C=0 respectively, we can constrain the gain adjustment to be effective only during speech, i.e.:

$\begin{matrix}{{g_{c}(b)} = \left\{ \begin{matrix}{{{t\left( {R(b)} \right)} \cdot {A(b)}},} & {{b < {b_{THR}\bigwedge C}} = 1} \\{1,} & {otherwise}\end{matrix} \right.} & (27)\end{matrix}$

In another alternative embodiment the system can act as a predictor together with a partially coded gain correction or compensation. In this embodiment the accuracy measure is used to improve the prediction of the gain correction or compensation such that the remaining gain error may be coded with fewer bits.

When creating the gain correction or compensation factor g_(c) one may want to make a trade-off between matching the RMS value or energy and minimizing the MSE. In some cases matching the energy becomes more important than an accurate waveform. This is for instance true for higher frequencies. To accommodate this, the final gain correction may, in a further embodiment, be formed by using a weighted sum of the different gain values:

$\begin{matrix}{g_{c}^{\prime} = {\frac{{\beta \; g_{RMS}} + {\left( {1 - \beta} \right)g_{MSE}}}{g_{RMS}} = {{\beta + {\left( {1 - \beta} \right)\frac{g_{MSE}}{g_{RMS}}}} = {\beta + {\left( {1 - \beta} \right)g_{c}}}}}} & (28)\end{matrix}$

where g_(c) is the gain correction obtained in accordance with one of the approaches described above. The weighting factor β can be made adaptive to e.g. the frequency, bitrate or signal type.
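The weighting of equation (28) reduces to a linear interpolation between 1 (pure RMS matching) and g_(c) (the MSE-oriented correction), as the following sketch illustrates. The function name is an assumption made for this example.

```python
def blended_gain_correction(g_c, beta):
    """Equation (28): beta = 1 keeps RMS matching, beta = 0 applies g_c fully."""
    return beta + (1.0 - beta) * g_c
```

For example, with g_(c)=0.8 and β=0.5 the final gain correction is 0.9, halfway between the energy-matching and the MSE-minimizing gains.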

The steps, functions, procedures and/or blocks described herein may be implemented in hardware using any conventional technology, such as discrete circuit or integrated circuit technology, including both general-purpose electronic circuitry and application-specific circuitry.

Alternatively, at least some of the steps, functions, procedures and/or blocks described herein may be implemented in software for execution by a suitable processing device, such as a microprocessor, Digital Signal Processor (DSP) and/or any suitable programmable logic device, such as a Field Programmable Gate Array (FPGA) device.

It should also be understood that it may be possible to reuse the general processing capabilities of the decoder. This may, for example, be done by reprogramming of the existing software or by adding new software components.

FIG. 17 illustrates an embodiment of a gain adjustment apparatus 60 in accordance with the present technology. This embodiment is based on a processor 110, for example a microprocessor, which executes a software component 120 for estimating the accuracy measure, a software component 130 for determining the gain correction, and a software component 140 for adjusting the gain representation. These software components are stored in memory 150. The processor 110 communicates with the memory over a system bus. The parameters {circumflex over (N)}(b), R(b), Ê(b) are received by an input/output (I/O) controller 160 controlling an I/O bus, to which the processor 110 and the memory 150 are connected. In this embodiment the parameters received by the I/O controller 160 are stored in the memory 150, where they are processed by the software components. Software components 120, 130 may implement the functionality of block 62 in the embodiments described above. Software component 140 may implement the functionality of block 64 in the embodiments described above. The adjusted gain representation {tilde over (E)}(b) obtained from software component 140 is outputted from the memory 150 by the I/O controller 160 over the I/O bus.

FIG. 18 illustrates an embodiment of gain adjustment in accordance with the present technology in more detail. An attenuation estimator 200 is configured to use the received bit allocation R(b) to determine a gain attenuation t(R(b)). The attenuation estimator 200 may, for example, be implemented as a lookup table or in software based on a linear equation such as equation (14) above. The bit allocation R(b) is also forwarded to a shape accuracy estimator 202, which also receives an estimated sparseness p_(max)(b) of the quantized shape, for example represented by the height of the highest pulse in the shape representation {circumflex over (N)}(b). The shape accuracy estimator 202 may, for example, be implemented as a lookup table. The estimated attenuation t(R(b)) and the estimated shape accuracy A(b) are multiplied in a multiplier 204. In one embodiment this product t(R(b))·A(b) directly forms the gain correction g_(c)(b). In another embodiment the gain correction g_(c)(b) is formed in accordance with equation (12) above. This requires a switch 206 controlled by a comparator 208, which determines whether the frequency band b is less than a frequency limit b_(THR). If this is the case, then g_(c)(b) is equal to t(R(b))·A(b). Otherwise g_(c)(b) is set to 1. The gain correction g_(c)(b) is forwarded to another multiplier 210, the other input of which receives the RMS matching gain g_(RMS)(b). The RMS matching gain g_(RMS)(b) is determined by an RMS matching gain calculator 212 based on the received shape representation {circumflex over (N)}(b) and corresponding bandwidth BW(b), see equation (4) above. The resulting product is forwarded to another multiplier 214, which also receives the shape representation {circumflex over (N)}(b) and the gain representation Ê(b), and forms the synthesis {circumflex over (X)}(b).
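The block structure of FIG. 18 may be sketched in software as follows. This is a non-limiting illustration under several assumptions: the attenuation estimator and the shape accuracy estimator are modeled as hypothetical lookup tables with illustrative entries, the RMS matching gain is assumed to scale the shape to unit RMS over the bandwidth, and all function names are chosen for this example only.

```python
import math


def rms_matching_gain(shape):
    """Assumed form of calculator 212: scale the shape to unit RMS."""
    energy = sum(n * n for n in shape)
    return math.sqrt(len(shape) / energy)


def gain_correction(rate, p_max, band, band_threshold,
                    attenuation_table, accuracy_table):
    """Blocks 200 to 208: t(R(b)) * A(b) below the frequency limit, else 1."""
    if band >= band_threshold:                  # comparator 208, switch 206
        return 1.0
    attenuation = attenuation_table[rate]       # attenuation estimator 200
    accuracy = accuracy_table[(rate, p_max)]    # shape accuracy estimator 202
    return attenuation * accuracy               # multiplier 204


def synthesize(shape, envelope, rate, band, band_threshold,
               attenuation_table, accuracy_table):
    """Form the band synthesis from the decoded shape and gain."""
    p_max = max(abs(n) for n in shape)          # height of the highest pulse
    g_c = gain_correction(rate, p_max, band, band_threshold,
                          attenuation_table, accuracy_table)
    gain = g_c * rms_matching_gain(shape) * envelope   # multipliers 210, 214
    return [gain * n for n in shape]
```

A table-driven design keeps the decoder-side cost to two lookups and a few multiplications per band, which matches the lookup-table implementations suggested for estimators 200 and 202.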

The stability detection described with reference to FIG. 10 may be incorporated into embodiment 2 as well as the other embodiments described above.

FIG. 19 is a flow chart illustrating the method in accordance with the present technology. Step S1 estimates an accuracy measure A(b) of the shape representation {circumflex over (N)}(b). The accuracy measure may, for example, be derived from shape quantization characteristics, such as R(b), S(b), indicating the resolution of the shape quantization. Step S2 determines a gain correction, such as g_(c)(b), {tilde over (g)}_(c)(b), g_(c)′(b), based on the estimated accuracy measure. Step S3 adjusts the gain representation Ê(b) based on the determined gain correction.

FIG. 20 is a flow chart illustrating an embodiment of the method in accordance with the present technology, in which the shape has been encoded using a pulse coding scheme and the gain correction depends on an estimated sparseness p_(max)(b) of the quantized shape. It is assumed that an accuracy measure has already been determined in step S1 (FIG. 19). Step S4 estimates a gain attenuation that depends on allocated bit rate. Step S5 determines a gain correction based on the estimated accuracy measure and the estimated gain attenuation. Thereafter the procedure proceeds to step S3 (FIG. 19) to adjust the gain representation.

FIG. 21 illustrates an embodiment of a network in accordance with the present technology. It includes a decoder 300 provided with a gain adjustment apparatus in accordance with the present technology. This embodiment illustrates a radio terminal, but other network nodes are also feasible. For example, if voice over IP (Internet Protocol) is used in the network, the nodes may comprise computers.

In the network node in FIG. 21 an antenna 302 receives a coded audio signal. A radio unit 304 transforms this signal into audio parameters, which are forwarded to the decoder 300 for generating a digital audio signal, as described with reference to the various embodiments above. The digital audio signal is then D/A converted and amplified in a unit 306 and finally forwarded to a loudspeaker 308.

Although the description above focuses on transform based audio coding, the same principles may also be applied to time domain audio coding with separate gain and shape representations, for example CELP coding.

It will be understood by those skilled in the art that various modifications and changes may be made to the present technology without departure from the scope thereof, which is defined by the appended claims.

Abbreviations

ADPCM Adaptive Differential Pulse-Code Modulation

AMR Adaptive MultiRate

AMR-WB Adaptive MultiRate WideBand

CELP Code Excited Linear Prediction

GSM-EFR Global System for Mobile communications—Enhanced FullRate

DSP Digital Signal Processor

FPGA Field Programmable Gate Array

IP Internet Protocol

MDCT Modified Discrete Cosine Transform

MSE Mean Squared Error

QMF Quadrature Mirror Filter

RMS Root-Mean-Square

VQ Vector Quantization

REFERENCES

-   [1] “ITU-T G.722.1 ANNEX C: A NEW LOW-COMPLEXITY 14 KHZ AUDIO CODING STANDARD”, ICASSP 2006.
-   [2] “ITU-T G.719: A NEW LOW-COMPLEXITY FULL-BAND (20 KHZ) AUDIO CODING STANDARD FOR HIGH-QUALITY CONVERSATIONAL APPLICATIONS”, WASPA 2009.
-   [3] U. Mittal, J. Ashley, E. Cruz-Zeno, “Low Complexity Factorial Pulse Coding of MDCT Coefficients using Approximation of Combinatorial Functions,” ICASSP 2007.
-   [4] “7 kHz Audio Coding Within 64 kbit/s”, [G.722], IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 1988.

What is claimed is:
 1. A gain adjustment method in decoding an audio signal that has been encoded with separate gain and shape representations, said method comprising: estimating an accuracy measure of the shape representation for a frequency band of the audio signal, wherein the shape representation encodes a shape vector comprising coefficients of the audio signal for the frequency band, and wherein the shape vector has been encoded using a pulse vector coding scheme where pulses may be added on top of each other to form pulses of different height, and the accuracy measure is based on the number of pulses used for encoding the shape vector and a height of the maximum pulse in the shape representation; determining, based on the estimated accuracy measure, a gain correction; and adjusting the gain representation for the frequency band based on the determined gain correction.
 2. The method of claim 1, further comprising determining the gain correction in dependence on a position of the frequency band relative to one or more defined frequency thresholds.
 3. The method of claim 1, further comprising: estimating a gain attenuation that depends on an allocated bit rate used for the shape representation; determining the gain correction based on the estimated accuracy measure and the estimated gain attenuation.
 4. The method of claim 3, further comprising estimating the gain attenuation from a lookup table that associates different gain attenuations with different allocated bit rates or ranges of allocated bit rates.
 5. The method of claim 3, further comprising estimating the accuracy measure from a lookup table that associates different accuracy measures with different numbers of pulses and/or different heights of the maximum pulse, as used for the shape representation.
 6. The method of claim 3, further comprising estimating the accuracy measure from a linear function of the maximum pulse height and the allocated bit rate.
 7. The method of claim 1, further comprising adapting the gain correction to a determined audio signal class of the audio signal.
 8. A gain adjustment apparatus for use in decoding an audio signal that has been encoded with separate gain and shape representations, said apparatus comprising: a first digital processing circuit that is configured to estimate an accuracy measure of the shape representation for a frequency band of the audio signal, and to determine a gain correction based on the accuracy measure, wherein the shape representation encodes a shape vector comprising coefficients of the audio signal for the frequency band, and wherein the shape vector has been encoded using a pulse vector coding scheme where pulses may be added on top of each other to form pulses of different height, and the accuracy measure is based on the number of pulses used for encoding the shape vector and a height of the maximum pulse in the shape representation; and a second digital processing circuit that is configured to adjust the gain representation for the frequency band based on the determined gain correction.
 9. The apparatus of claim 8, wherein the first digital processing circuit is further configured to determine the gain correction in dependence on a position of the frequency band relative to one or more defined frequency thresholds.
 10. The apparatus of claim 8, wherein the first digital processing circuit is further configured to estimate a gain attenuation that depends on an allocated bit rate used for the shape representation, and wherein the first digital processing circuit is configured to determine the gain correction based on the estimated accuracy measure and the estimated gain attenuation.
 11. The apparatus of claim 10, wherein the first digital processing circuit is configured to estimate the gain attenuation using a lookup table that associates different gain attenuations with different allocated bit rates or ranges of allocated bit rates.
 12. The apparatus of claim 10, wherein the first digital processing circuit is configured to estimate the accuracy measure from a lookup table that associates different accuracy measures with different numbers of pulses and/or different heights of the maximum pulse, as used for the shape representation.
 13. The apparatus of claim 10, wherein the first digital processing circuit is configured to estimate the accuracy measure from a linear function of the maximum pulse height and the allocated bit rate.
 14. The apparatus of claim 8, wherein the first digital processing circuit is configured to adapt the gain correction to a determined audio signal class of the audio signal.
 15. A decoder comprising the gain adjustment apparatus of claim 8.
 16. A network node comprising the decoder of claim 15.